Fast Dynamic Multiset Membership Testing Using Combinatorial Bloom Filters
نویسندگان
چکیده
In this paper we consider the problem of designing a data structure that can perform fast multiset membership testing in deterministic time. Our primary goal is to develop a hardware implementation of the data structure which uses only embedded memory blocks. Prior efforts to solve this problem involve hashing into multiple Bloom filters. Such approach needs a priori knowledge of the number of elements in each set in order to size the Bloom filter. We use a single Bloom filter based approach and use multiple sets of hash functions to code for the set (group) id. Since a single Bloom filter is used, it does not need a priori knowledge of the distribution of the elements across the different sets. We show how to improve the performance of the data structure by using constant weight error correcting codes for coding the group id. Using error correcting codes improves the performance of these data structures especially when there are large number of sets. We also outline an efficient hardware based approach to generate the the large number of hash functions that we need for this data structure. The resulting data structure, COMB, is amenable to a variety of time-critical network applications.
منابع مشابه
Two-tier Bloom filter to achieve faster membership testing
Introduction: Bloom filters [1] are a space-efficient, probabilistic data structure for representing a list of elements (for example, a list of strings). A Bloom filter is an array of m bits. A string is mapped into a Bloom filter by inputting it to a group of k hash functions resulting in k array positions. Each indexed array position is set to 1. A string is tested for membership by inputting...
متن کاملNAE-SAT-based probabilistic membership filters
Probabilistic membership filters are a type of data structure designed to quickly verify whether an element of a large data set belongs to a subset of the data. While false negatives are not possible, false positives are. Therefore, the main goal of any good probabilistic membership filter is to have a small false-positive rate while being memory efficient and fast to query. Although Bloom filt...
متن کاملPersistent Bloom Filter: Membership Testing for the Entire History
Membership testing is the problem of testing whether an element is in a set of elements. Performing the test exactly is expensive space-wise, requiring the storage of all elements in a set. In many applications, an approximate testing that can be done quickly using small space is often desired. Bloom filter (BF) was designed and has witnessed great success across numerous application domains. B...
متن کاملBioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters
Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular ali...
متن کاملBloom Filters & Their Applications
A Bloom Filter (BF) is a data structure suitable for performing set membership queries very efficiently. A Standard Bloom Filter representing a set of n elements is generated by an array of m bits and uses k independent hash functions. Bloom Filters have some attractive properties including low storage requirement, fast membership checking and no false negatives. False positives are possible bu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009